{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b4ea3722",
   "metadata": {},
   "source": [
    "## Document Loading\n",
    "\n",
    "Load a blog post on agents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4162356-c370-43d7-b34a-4e6af7a1e4c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install pdf2image"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ff363da",
   "metadata": {},
   "source": [
    "Load academic papers -"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "d79e191a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import arxiv\n",
    "from langchain.chat_models import ChatAnthropic\n",
    "from langchain.document_loaders import ArxivLoader, UnstructuredPDFLoader\n",
    "\n",
    "# Load a paper to use\n",
    "paper = next(arxiv.Search(query=\"Visual Instruction Tuning\").results())\n",
    "paper.download_pdf(filename=\"downloaded-paper.pdf\")\n",
    "loader = UnstructuredPDFLoader(\"downloaded-paper.pdf\")\n",
    "doc = loader.load()[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db964a34",
   "metadata": {},
   "source": [
    "Or try loading blog posts -"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "b2f10100",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.document_loaders import WebBaseLoader\n",
    "\n",
    "loader = WebBaseLoader(\"https://lilianweng.github.io/posts/2023-06-23-agent/\")\n",
    "text = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "361fcf5c",
   "metadata": {},
   "source": [
    "## Run template\n",
    "\n",
    "In `server.py`, set -\n",
    "```\n",
    "add_routes(app, chain_rag_conv, path=\"/summarize-anthropic\")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "3c673b53",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\" Here is my attempt to complete the requested tasks:\\n\\n<kindergarten_abstract>\\nThe scientists made a robot friend that can see pictures and talk about them. First they let a big talking robot make up fun conversations about pictures. Then they taught their robot friend to have conversations too. Their robot friend got pretty good at seeing pictures and talking about them!\\n</kindergarten_abstract>\\n\\n<moosewood_methods>\\nVisual Instruction Tuning\\nRecipe adapted from Haotian Liu et al.\\n\\nIngredients:\\n- 1 large language model (we used GPT-4)\\n- 158,000 images with text descriptions \\n- 1 vision model (we used CLIP)\\n- 1 large multimodal model (we used LLaMA)\\n\\nInstructions:\\n1. Have the GPT-4 model make up conversations, long descriptions, and complex stories about the images, using only the text descriptions as hints. \\n2. Connect the vision model and language model together into one big multimodal model. This will be your robot friend!\\n3. Show the robot friend examples of the conversations GPT-4 made up.\\n4. Let the robot friend practice having conversations about images, using what it learned from GPT-4.\\n5. Test how well it can see and talk about new images!\\n</moosewood_methods>\\n\\n<homer_results>\\nOh Muse, sing of scientists wise,  \\nWho built a robot with ears and eyes,  \\nTaught by a speaker most eloquent,  \\nThey molded silicon assistant.\\n\\nThis visual machine could see and tell  \\nOf images with knowledge it knew well.  \\nWhen humans speak in words precise,  \\nThe robot answers concise and wise.\\n\\nThough simpler than human true creation,  \\nIt showed the path for innovation.  \\nMore data, bigger models in time,  \\nMay lead to an assistant sublime.\\n</homer_results>\\n\\n<grouchy_critique>\\nOh great, another computer vision model. What is it this time, some overhyped multimodal beast trained on mountains of compute? Bah! Back in my day we built models with only a handful of pixels and a Commodore 64. And they worked just fine, not like these newfangled neural nets that gobble up GPUs like candy.\\n\\nLet's see, they got GPT-4 to hallucinate training data from captions? Well that hardly seems rigorous. Those mechaturkeys will fabricate anything if you let them. And they didn't even collect real human annotations? Training multimodal models is hard enough without using made-up data. \\n\\nI'm sure their LaVa MuDdLe model does decently on narrow tasks, but don't be surprised when it fails catastrophically in the real world. These things may seem clever, but they're as dumb as a bag of rocks under the hood. Mark my words, we should keep them far away from any mission critical systems, or we'll regret it!\\n\\nBah! Call me when we build real intelligence, not these statistical hackjobs.\\n</grouchy_critique>\""
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langserve.client import RemoteRunnable\n",
    "\n",
    "summarization = RemoteRunnable(\"http://localhost:8000/summarize-anthropic\")\n",
    "summarization.invoke({\"text\": doc.page_content})"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
